devtools::install_github("grunwaldlab/metacoder") # do this before trying to knit
library(metacoder)
library(taxa)
##
## Attaching package: 'taxa'
## The following objects are masked from 'package:metacoder':
##
## get_subtaxa, get_taxon_items
## The following object is masked from 'package:stats':
##
## binomial
file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_3 <- extract_taxonomy(names(sequences),
regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
key = c(name = "item_info", seq_id = "item_info",
other_id = "item_info", "class_name"),
database = "none")
Normally, users would not create the class from scratch, since this is rather complex. Instead, the class would be the output of functions like metacoder::extract_taxonomy. unite_ex_data_3 is an example of the current output format of metacoder::extract_taxonomy, but I would like to change its output to a classified object.
data <- classified(taxon_id = unite_ex_data_3$taxa$taxon_id,
parent_id = unite_ex_data_3$taxa$parent_id,
item_taxon_id = unite_ex_data_3$items$taxon_id,
taxon_data = data.frame(name = unite_ex_data_3$taxa$name,
rank = unite_ex_data_3$taxa$rank),
item_data = data.frame(seq_id = unite_ex_data_3$items$seq_id))
Note how the item_count column is calculated each time it is needed. This is because the number of items per taxon can change when taxa are subsetted.
head(taxon_data(data))
## taxon_id parent_id name rank item_count
## 1 1 <NA> Fungi k 500
## 2 2 1 Ascomycota p 180
## 3 3 2 Leotiomycetes c 32
## 4 4 3 Helotiales o 28
## 5 5 4 Hyaloscyphaceae f 25
## 6 6 5 Lachnum g 17
head(item_data(data))
## taxon_id seq_id
## 1 7 JQ347180
## 2 9 U59145
## 3 7 AM084756
## 4 7 FM172814
## 5 7 FN539058
## 6 10 AB481260
I already made a plot method for classified in metacoder that calls metacoder::plot_taxonomy. I also added some non-standard argument evaluation for the subsetting and plotting methods to save users some typing. Columns in classified_object$taxon_data can be refereed to by name alone.
plot(data,
vertex_size = item_count,
vertex_color = item_count,
vertex_label = name)
Note how this:
plot(data[name == "Ascomycota", ],
vertex_size = item_count,
vertex_color = item_count,
vertex_label = name)
… is the same as this:
data_subset <- data[data$taxon_data$name == "Ascomycota", ]
plot(data_subset,
vertex_size = taxon_data(data_subset)$item_count,
vertex_color = taxon_data(data_subset)$item_count,
vertex_label = data_subset$taxon_data$name)
The [[ currently corresponds to a more restrictive type of subsetting, in which the parents and children of taxa are not automatically preserved.
plot(data[[item_count > 5]],
vertex_size = item_count,
vertex_color = item_count,
vertex_label = name)
plot(data[[item_count < 200]],
vertex_size = item_count,
vertex_color = item_count,
vertex_label = name,
layout = "fruchterman-reingold")
However, it is still not working right as you might expect. I think this is because, when a child taxon is eliminated, its items are not transferred to the parent taxon. This is why some taxa in the above two plots have 0 items. Their children all together had > 5 items, but each child had < 5 items. I think the solution is to transfer the items of eliminated children to their non-eliminated parents, but I need to look into this more.